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■ Abstract 

^ ■ The low-rank matrix factorization as a Li norm minimization problem has recently attracted much attention 



due to its intrinsic robustness to the presence of outliers and missing data. In this paper, we propose a new method, 
called the divide-and-conquer method, for solving this problem. The main idea is to break the original problem 
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00 ■ into a series of smallest possible sub-problems, each involving only unique scalar parameter. Each of these sub- 

^SJ ■ problems is proved to be convex and has closed-form solution. By recursively optimizing these small problems 

o: 

(N. in an analytical way, efficient algorithm, entirely avoiding the time-consuming numerical optimization as an inner 



loop, for solving the original problem can naturally be constructed. The computational complexity of the proposed 
algorithm is approximately linear in both data size and dimensionality, making it possible to handle large-scale 
Li norm matrix factorization problems. The algorithm is also theoretically proved to be convergent. Based on 
a series of experiment results, it is substantiated that our method always achieves better results than the current 
state-of-the-art methods on Li matrix factorization calculation in both computational time and accuracy, especially 
on large-scale applications such as face recognition and structure from motion. 
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I. Introduction 

In recent years, the problem of low-rank matrix factorization (LRMF) has attracted much attention due 
to its wide range of applications in computer vision and pattern recognition, such as structure from motion 
[15], face recognition [16], shape from varying illumination [8], and object tracking [1]. Representing the 
measurements or the observation data as a x n matrix X = (xi,X2,-- - ,x„), whose columns XjS 
correspond to the rf-dimensional input measurements and n is the number of input items, the aim of the 
LRMF can be mathematically described as solving the following optimization problem: 

minllX-f/V^II, (1) 

Uy \l II ' 

where U = (ui, U2, ■ ■ ■ , u^.) G R'^^'', V = (vi, V2, ■ ■ ■ , v^) G i?"^'^ and k < d,n. To deal with the real 
LRMF problems in the presence of missing data, the optimization ([T]) is also reformulated as 

minllW Q(X -UV^)\\, (2) 
u,v " " 

where denotes the component-wise multiplication (i.e., the Hadamard product), and the element Wij of 
the denotation matrix 14^ G -R''^" is 1 if the corresponding element of X is known, and otherwise [2]. 
Here ||-|| is some form of the matrix norm. 

The global minimum of the optimization problem ([T]) with L2 matrix norm (i.e., the Frobenius norm) 
can easily be solved by the well known singular value decomposition (SVD, [7]) method. To handle 
missing data, some methods, such as the Wiberg algorithm [13] and the weighted low-rank approximation 
method (WLRA, [14]), have further been proposed to solve the optimization ^ with L2 matrix norm. 
The performance of these techniques, however, is sensitive to the presence of outliers or noises, which 
often happen in real measurements, because the influence of outliers or noises with a large norm tends 
to be considerably exaggerated by the use of the L2 norm [3], [11]. To alleviate this robustness problem, 
the often used approach is to replace the L2 matrix norm with the Li norm in the objective functions of 
© and ^ [5], [4], [9], [11]. The models are then expressed in the following forms: 

min||X-f/y^|L (3) 

Uy \\ IILi 
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and 

mm\\W Q(X -UV^)\\^ . (4) 

u,v " "-f^i 

Unfortunately, it turns out that replacing the L2 norm with Li norm in the optimizations makes the problem 
significantly more difficult [5]. First, both ([3]) and dH) are non-convex problems, so their global optimality 
are in general difficult to obtain. Second, both optimizations are non-smooth problems conducted by the 
Li matrix norm, so it is hard to attain an easy closed-form iteration formula to efficiently approximate 
their solutions by standard optimization tools. Third, in real applications, both optimizations can also be 
very computationally demanding problems to solve, which always limit their availability in large-scale 
practice. 

In this paper, by employing an important algorithm design paradigm, namely divide and conquer, we 
formulate efficient algorithms against the optimization models © and (HI), respectively. The core idea 
underlying the new algorithms is to break the original optimizations into a series of smallest possible 
problems and recursively solve them. Each of these small problems is convex and has closed-form solution, 
which enables the new algorithms to avoid using a time-consuming numerical optimization as an inner 
loop. The proposed algorithms are thus easy to implement. Especially, it is theoretically evaluated that 
the computational speeds of the proposed algorithms are approximately linear in both data size and 
dimensionality, which allows them to handle large-scale Li norm matrix factorization problems. The 
efficiency and robustness of the proposed algorithms have also been substantiated by a series of experiments 
implemented on synthetic and real data. 

Throughout the paper, we denote matrices, vectors, and scalars by the upper-case letters, lower case 
bold-faced letters, and lower-case non-bold-faced letters, respectively. 

II. Related work 

Various approaches have recently been proposed to deal with the optimizations ^ and (H)) to achieve 
robust low-rank matrix factorization results. For the Li norm model ([3]), the iteratively re-weighted least- 
squares approach introduced by Torre and Black is one of the first attempts [3]. Its main idea is to iteratively 
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assign a weight to each element in the measurements. The method, however, is generally very sensitive 
to initialization ([9]). Instead of the Li matrix norm. Ding et al. utilized the rotational invariant Ri norm, 

n d 

as defined by = X^Zl^^p^^^ for the objective function of © (ICML06, [4]). Like the Li norm, 

the -Ri norm so defined is also capable of softening the contributions from outliers. By substituting the 
maximization of the Li dispersion of data, ||[/^X||i^, for the minimization of the original Li objective, 
IIX — f/\^^|| , Kwak presented another approach for the problem (PAMI08, [11]). The method is also 
able to suppress the negative effects of outliers to a certain extent. The most predominance of this method 
is its fast computational speed, which is linear in both measurement size and dimensionality. 

Two methods represent the current state of the art of solving the model dH). The first method is presented 
by Ke and Kanade, who formulated the robust Li norm matrix factorization objective as alternative 
convex programs (CVPR05, [9]). The programs can then be efficiently solved by linear or quadratic 
programming. The second method is designed by Eriksson and Hengel, which represents a generalization 
of the traditional Wiberg algorithm (CVPRIO, [5]). The method has been empirically proved to perform 
well on some synthetic and real world problems, such as the structure from motion (SFM) applications. 
It should be noted that both methods can also be employed to solve ([3]) by setting all elements of the 
missing data denotation matrix to be Is. 

III. DiVIDE-AND-CONQUER METHOD FOR ROBUST MATRIX FACTORIZATION 

Unlike the previous methods for robust matrix factorization, the proposed divide- and-conquer (D&C in 
brief) method chooses to solve the smallest possible sub-problems of ^ and dH) at every step (involving 
only one scalar parameter of U ox V). The advantage of the new method lies in the fact that each small 
sub-problem so attained can be solved analytically. Thus, complicated numerical optimization techniques 
are entirely avoided, and the overall problem can thus be efficiently solved. We introduce our method and 
its theoretical fundament as follows. 



A. Breaking the model into smallest sub-problems 

We first consider the optimization model ([3]). Since the fc-rank matrix UV^ can be partitioned into the 

k 

sum of k 1-rank matrices, i.e., UV'^ = Xl^j'^j^' ® can thus be equivalently reformulated as 



mm 



k 

The original /c-rank matrix factorization problem can then be decomposed into a series of recursive 1-rank 
sub-problems: 

min \\Ei - Ujvf 11 , (6) 



(5) 



where Ei = {e\,e\,--- , e^) = X — Zl^i'^J- We can naturally approximate the solution of ^ by 
sequentially solving @ with respect to (uj, v.^) for z = 1, 2, ■ ■ ■ , /c, with all other (u^ , Vj)s (j 7^ i) fixed. 

Solving @ can further be simplified to alteratively optimizing Uj or Vj while letting the other fixed. 
Since Uj and can be solved in a completely symmetrical way (in the sense that \\E — uv^||^^ = 
\\E'^ — vu-^ll ), we only need to consider how to efficiently solve 

min llEi - Ujvf II . (7) 

By reformulating (|7]) to its decoupling form: 



n 

mm 



where Vij is the j-th element of the vector v,, the problem can then be further divided into n small 
sub-optimizations with the following expression (for j = 1, 2, ■ ■ ■ , n): 

min lie* - Uit;i,|| . (9) 

From ([5]) to (|9l), we have broken the original large optimization ([3]), with respect to U and V, into a 
series of smallest possible optimization problems, each with respect to only one scalar parameter of U or 
V . By utilizing the similar strategy, it is also easy to decompose the large optimization dH) into a series 
of small optimizations, expressed as: 

min (e* - ViiVij)\\ ^ , (10) 



where wj is the j-th column of the denotation matrix W. 

It is very fortunate that both small optimizations ^ and (flOl ) are not only convex, but also have closed 
form solutions. This implies that it is possible to construct fast algorithms for dS]) and (U), as introduced 
in the following discussion. 

B. The closed form solutions of ^ and diOl) 
We first formalize ^ as: 

min /e,u(^^) = ||e - ut;||^ (11) 

where both e and u are (i-dimensional vectors, and denote their i-th elements as and Ui, respectively. The 
following theorem shows the convexity of this optimization problem (the proofs of all involved theorems 
are moved to the supplementary material due to the page limitation). 

Theorem 1: fe,u{v) as defined in (fTTI) is a convex function with respect to v. 

Theorem 1 implies that it is hopeful to find the global optimum of (fTTI) . We first clarify the case when 
all elements of u are positive in the following lemma. 

Lemma 1: For (fTTI) . assuming each element Ui of u is positive (ui > 0,i = 1,2, d), denote 
• the label set Le,u = {^i^'^K ' ' ' ' "^)' permutation of (1, 2, ■ ■ ■ ,d) based on the ascending 
order of ( — , ^, 



fid V 



d d id 

, the sequence Te,u = {ao^ai,- ■■ ,ad)- ao = -^Uii^.^), CLd = ^u^i.,^), tti = ^u^i.,^)- Yl 

j=i ^ j=i ^ j=i ^ j=i+i J 

l,2,...,ci-l; 

• the label ie.u^ the label of the first non-negative element of Fg^u; 
and the following closed form expression provides a global optimum of (fTTf) : 



P(e,u):=^, r = /t:). (12) 



Ui 



It is easy to deduce that T^ ^ is a monotonically increasing sequence, and Oq < 0, > 0. Thus, the 
label ie,u can be uniquely found from the sequence. 
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Algorithm 1: D&C algorithm for solving min \ \X — UV'^\ 



Given: X = (xi,X2,-- - ,x„) e R'^''" 
Execute: 

1. Randomly initialize (7™ = {u[°\ uf\ ■ ■ ■ , uf^ ) G = (vi"', vf , ■ • ■ , vf ) G i?"^*; 

For t = 1, 2, ■ • ■ , convergence 

2. Letuf =uf-^',vf)=vf-i\i=l,2,...fe. 

For i = 1,2, 

^ ft) (t)T 

3. Compute Ei — X ~ ' denote the column and row vectors of Ei as (el, e2, ■ ■ ■ , e'J and 

(el,e^, ■ • ■ ,e'a), respectively. 

4. Let I.**' = Q(ej, uf ') for j = 1, 2, ■ • • ,n based on (Hi, and then update = (i-'^' , w'a' . ' ' ' > «in 

5. Let = Q(ej, vf ) for j = 1, 2, ■ ■ ■ , d based on GDl, and then update = (^ta^ 'ii2'> ' ' ' > ^^id 
End For 

6. Update = (u(*',uW,... ,uW), = (v'*', v(*\ • • ■ , v^). 
End For 



The above theorem gives a closed form solution for ([TT]) under positive vector u. Next theorem further 
gives the solution of (fTTI) in general cases. 
Theorem 2: For (fTTI) . denote 

• the label set lu = (ii, 22, ■ ■ ■ ; < d): the labels of the nonzero elements of u; 

• 5 1 ' ' ' 1 ^* ' ^e,u ifiii 1 ^*2 1 ' ' ' 1 ' 

• 'Tu=sign{Tu) Tu, \f'e,u=sifi'?^(Tu) ^I/e,u» where sign{-) is the signum function; 
and the following closed form expression provides a global optimum of (fTT|) : 

Q(e,u) :=P(^e,u,T,), (13) 

where the function -P(-, ■) is defined as (fT2l) . 
We then consider (fTOl) . First formalize it as 

min U,eA^) = II w (e - u?;)||^ , (14) 
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4. Let vlf = Q(wj e] , Wj ) for 

5. Let = Qiwj 5} , vf ) for 


i = l,2,■ 
j=l,2,• 


■ ■ ,n based on 

■ ■ ,d based on 


(list, and then update v'*^ = (n'^^ , n'j' , • ■ 
M5\, and then update u'*' = (u'j' , u^j' i ' ' 


' "id • 



where w, e, u are all (i-dimensional vectors. Since 

II w (e - Mv) II = II (w e) - (w u)v) \\ , 
(fT4l) can then be seen as a special case of (fTTI) in the sense that 

It thus holds the following theorem based on Theorems 1 and 2. 

Theorem 3: (fT4l) is a convex optimization problem with closed form solution 

g(w0e,w0u), (15) 

where Q{-,-) is defined as (fT3l) . 

By virtue of the closed form solutions for the small optimization problems (l9l)/(fTTI) and (fT0l)/(fT4l) given 
by Theorems 2 and 3, respectively, we can now construct fast algorithms for solving the original large 
robust matrix factorization problems ^ and (H]). 

C. Fast algorithms for robust matrix factorization 

We first consider the D&C algorithm for the optimization model The main idea of our algorithm 
is to sequentially update each element of U and V. In specific, the algorithm iteratively updates each 
element of Uj and v, for i = 1,2, ■ ■ ■ , k, with other UjS and VjS (j ^ i) fixed, in the following way: 

. Update each element Vij (j = 1,2, - ■ ■ ,n) of Vj under fixed u^: the optimal value Vij is attained 
through the following closed form expression based on Theorem 2 

Vij = Q{e], Mi) = argmin ||e* - Uii;jj||^^ , 
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where e* is the j-th column vector of the representation error matrix 

Ei = X- J]u,vJ. (16) 

. Update each element Uij (j = 1,2, d) of Ui under fixed v^: the optimal value of Uij is achieved 
through 

Uij = Q{e'j, Vj) = argmin ||e* - ^iUij\\^ 

based on Theorem 2, where e* denotes the j-th row vector of Ei. 
Through implementing the above iterations from i = 1 to k, the low-rank factorized matrices U and V 
can then be recursively updated until the termination condition is satisfied. We embed the aforementioned 
D&C technique into Algorithm 1. 

The D&C algorithm for solving (HI) is very similar to Algorithm 1, the only difference is the updating 
of each element of Uj and Vj, which is summarized as follows: 

• Update each element t>jj (j = 1, 2, ■ ■ ■ , n) of Vj under fixed u^: the optimal value Vij is solved through 

Vij = Q{-Wj e*, Wj Uj) 

= arg min 1 1 w^- (e* - UiVij) 1 1 

based on Theorem 3, where Wj is the j-th column vector of W, and e* is the j-th column vector of 
the representation error matrix Ei (defined as (fT6l)). 

• Update each element Uij (j = 1, 2, ■ ■ ■ ,d) of Uj under fixed Vji the optimal value of Uij is attained 
by 

Uij = Qi^j e}, Wj Vj) 

= arg min 1 1 (e* - ViUij ) 1 1 

Uij ■' "-^1 

based on Theorem 3, where Wj denotes the j-th row vector of W , and e* is the j-th row vector of 
E,. 
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Since the D&C algorithm for solving (U) differs from Algorithm 1 only in steps 4 and 5, we only list 
these two steps of the algorithm in Algorithm 2. 

The remaining issues are then on how to appropriately specify the initial U and V in step 1, and 
when to terminate the iterative process in steps 2-6 of the proposed algorithms. In our experiments, we 
just randomly initiated each element of U and V, and the proposed algorithm performed well in all 
experiments under such simple initialization strategy. As for the termination of the algorithms, since the 
objective function of dH) or (H]) decreases monotonically throughout the iterative process (see details in the 
next section), the algorithms can reasonably be terminated when the updating extent \\U^^^ — U^^"^^]] (or 
||\/(*) _ is smaller than some preset small threshold, or the process has reached the pre-specified 

number of iterations. 

D. Convergence and computational complexity 

We now discuss the convergence of the proposed algorithms. It should be noted that in steps 4 and 5 of 
Algorithm 1/Algorithm 2, the global minimum of the objective function of ©/(HI) with respect to Vj and 
Uj (with other VjS and u^s fixed) is analytically obtained based on Theorem 2/Theorem 3, respectively, and 
thus in each of the iteration steps of the algorithm, the objective function of the problem is monotonically 
decreasing. Since it is evident that both objective functions of ([3]) and dH) are lower bounded (> 0), the 
algorithm is guaranteed to be convergent. 

The computational complexity of Algorithm 1/Algorithm 2 is essentially determined by the iterations 
between steps 4 and 5, i.e., the calculation of the closed form solutions of Vj and Uj. To compute the 
global optimum for each element Vij of Vj (j = 1, 2, ■ ■ ■ , ra) in step 4 of the algorithm, the closed form 
expression P(-, •), as defined in Lemma 1, is utilized, which costs 0{d\ogd) computation to obtain the 
label set Le,u by applying the well-known heap sorting algorithm [10], and costs at most 0{d) computation 
to seek the label of the first nonzero element ie,u from the sequence re,u- Altogether, calculating the 
global optimum for each Vij needs around 0{d\ogd) computational time. Updating the entire Vj thus 
requires about 0{nd\ogd) computational cost. It can similarly be deduced that updating Uj in step 5 needs 
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around 0{nd\ogn) computational cost. The entire computational complexity of Algorithm 1/Algorithm 
2 is thus about 0{k{ndlogd + ndlogn)) x T, where T is the number of iterations for convergence. 
That is, the computational speeds of the proposed algorithms are approximately linear in both the size 
and dimensionality of the input measurements X, as well as its intrinsic rank k. Such computational 
complexity makes the use of the proposed algorithms possible in large-scale Li norm matrix factorization 
problems, as demonstrated in the following experiments. 

IV. Experiments 

To evaluate the performance of the proposed D&C algorithms on robust matrix factorization, it was 
applied to various synthetic and real problems with outliers and missing data. The results are summarized 
in the following discussion. All programs were implemented under the Matlab 7.0 platform. The imple- 
mentation environment was the personal computer with Intel Core(TM)2 Quad Q9300@2.50 G (CPU), 
3.25GB (memory), and Windows XP (OS). 

A. Experiments on data with outliers 

Three series of experiments were designed to evaluate the performance of the proposed Algorithm 1 
on data with intrinsic outliers (for solving the optimization model ([3])). The details are listed as follows: 

Small synthetic experiment El: Containing 100 synthetic 30 x 30 matrices, each with intrinsic rank 3. 
Each matrix was first generated as a product UV'^ , where U and V are independent 30 x 3 matrices, 
whose elements are i.i.d. Gaussian random variables with zero mean and unit variance; and then 10% 
elements of the matrix were randomly picked up and transformed into outliers by randomly assigning 
them values in the range of [-40,40]. 

Large synthetic experiments E2,E3,E4: Containing 3 synthetic 7000 x 7000 matrices, each with intrinsic 
rank 3. Each was first generated from the product UV^ , where U and V are independent 7000 x 3 matrices, 
whose elements are i.i.d. Gaussian random variables with zero mean and unit variance, and then different 




Fig. 1. From top to bottom: Typical Yale B face images, the corresponding images corrupted with 20% outliers, and the faces reconstructed 
by the SVD, the PAMI08, and the proposed D&C methods, respectively. 

extents of outliers were assigned to randomly selected 10% elements of the original matrices, with ranges 
[-40,40], [-400,400], and [-4000,4000], in E2, E3, and E4, respectively. 

Face recognition experiments E5,E6: The input data are composed by 256 face images, each with pixel 
size 192 X 168, i.e., the matrix is of the size 32256 x 256. The images were first extracted from subsets 
1-4 of the Extended Yale B database [6], and then were corrupted with 20% and 50% of dead pixels with 
either maximum or minimum intensity values in E5 and E6, respectively. Typical images are depicted in 
Figures [T] and [21 

In each of these experiments, the original un-corrupted matrix, denoted as X, is saved as ground truth 
for comparison purpose. 

For comparison, 5 of the current methods for low-rank matrix factorization, including SVD, ICML06 
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Fig. 2. From top to bottom: Typical Yale B face images, the corresponding images corrupted with 50% outliers, and the faces reconstructed 
by the SVD, the PAMI08, and the proposed D&C methods, respectively. 

[4], PAMI08 [11], CVPR05 [9], and CVPRIO [5], have also been utilized. Except SVD, which need not be 
initialized, and ICML06, which requires the SVD initialization [4], all of the utilized methods employed 
the similar initialization for each involved experiment. The rank k was preset as 3 in E1-E4 and 20 in E5 
and E6 for all methods. The performance comparison of these methods is shown in Table U (that of El is 
shown as the average result over 100 experiments). In the table, / means that the corresponding method 
on the experiment could not be completed in reasonable time. For easy observation. Figures [U and [2] 
demonstrate some of the original and reconstructed images in E5 and E6, respectively. The reconstruction 
is implemented by the product of t/V^, where the low-rank matrices U and V are the outputs of the 
corresponding matrix factorization method. 

The advantage of the proposed Algorithm 1, as compared with other utilized methods, can evidently be 
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Computational time (s) 


Accuracy: " ^^^^^^ ([12]) 


El 


0.0045 


0.022 


0.0014 


0.835 


411.0 


0.048 


7.67 


7.64 


5.95 


0.0693 


0.0221 


3.57 X 10"'* 


E2 


137.31 


/ 


139.06 


/ 


/ 


7612 


0.025 


/ 


0.030 


/ 


/ 


7.08 X 10"*'' 


E3 


146.13 


/ 


159.60 


/ 


/ 


6953 


3.91 


/ 


3.24 


/ 


/ 


2.04 X 10"*^ 




119.85 


/ 


189.54 


/ 


/ 


7279 


293.4 


/ 


222.9 


/ 


/ 


4.39 X 10"" 


E5 


33.41 


/ 


92.64 


/ 


/ 


7335 


0.124 


/ 


0.117 


/ 


/ 


0.0312 


E6 


59.90 


/ 


234.78 


/ 


/ 


7275 


0.384 


/ 


0.338 


/ 


/ 


0.0959 



TABLE I 



The performance comparison of the 5 current matrix factorization methods and the proposed D&C method in 
experiments e1-e6. in each cell, the values from the left to the right refer to the svd, icml06, pami08, cvpr05, 
cvprio, and d&c methods, respectively. the best result in each experiment is highlighted. x denotes the original 
un-corrupted matrix (ground truth), and u and v denote the outputs of the corresponding matrix factorization 
method. / means that the corresponding method on the experiment could not be completed in reasonable time. 

observed from Table Uin robust matrix factorization calculation. Specifically, our method attains the highest 
computational accuracy in all of the involved experiments. For Yale B experiments E5 and E6, it is very 
interesting that some latent features underlying the original faces can be extracted from the reconstructed 
images, as clearly depicted in Figures \T\ and [21 Even more interesting is that these reconstructions are 
obtained from the corrupted but not the original images. The new method is thus potentially useful for 
latent feature extraction from noisy measurements in real applications. Another merit of our method is 
that it has stable performance on different extents of outliers, which can evidently be observed in the 
E2-E4 results, in which the reconstructed low-rank matrix attained by our method is always extremely 
close to the ground truth. Although the computational speed of the proposed algorithm is slower than the 
SVD and PAMI08 methods in the experiments, considering that both SVD and PAMI08 are not designed 
against the Li norm matrix factorization model Q, the efficiency of the proposed method is still dominant 
in the methods against Q, especially in large-scale cases. 
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B. Experiments on data with outliers and missing components 

We also implemented three series of experiments to evaluate the performance of Algorithm 2 on data 
with intrinsic outliers and missing components (for solving the optimization model Q). The details are 
summarized as follows: 

Small synthetic experiment E7: Containing 100 synthetic 20 x 30 matrices, each with intrinsic rank 3. 
Each matrix was first generated as a product UV'^ , where U and V are 20 x 3 and 30 x 3 matrices, 
respectively, whose elements were generated from the Gaussian distribution with zero mean and unit 
variance. Then 5% of the elements were selected at random and designated as missing by setting the 
corresponding entry in the matrix W to zero. To simulate outliers, uniformly distributed noises over 
[—5, 5] were additionally added to 10% of the elements of the matrix. 

Large synthetic experiment E8: Containing one synthetic 10000 x 700 data matrix, with intrinsic rank 
40. The matrix was first generated as a product UV'^ , where U and V are 10000 x 40 and 700 x 40 
matrices, randomly generated from the Gaussian distribution with zero mean and unit variance. Then 20% 
and 10% of the elements were selected at random and designated as missing components and outliers 
(randomly chosen in [—5,5]), respectively. 

Structure from motion experiments E9,E10,E1 1: The structure from motion (SFM) problem can be posed 
as a typical low-rank matrix approximation task [5], [9]. In this series of experiments, we employ two 



well known SFM data sequence, the dinosaur sequence, available at , http://www.robots.ox.ac.uk/~vgg /, and 
the pingpong ball sequence, available at |http://vasc.ri.cmu.edu/ic[b7l for substantiation. The entire dinosaur 
and pingpong sequence contain projections of 4983 and 839 points tracked over 36 and 226 frames, 
respectively, composing 4983 x 72 and 839 x 452 SFM matrices correspondingly. Each matrix contains 
more than 80% missing data due to occlusions or tracking failures. As considering robust approximation 
in this work, we further include outliers uniformly generated from [—5000, 5000] in 10% components 
of two matrices to form the input data of the experiments ElO and Ell, respectively!!! Since some other 



The components of both SFM matrices are also approximately located in [—5000, 5000]. 
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Computational time (s) 


Accuracy: (E7,E8), " (E9-E11) 


E7 


61.50 


24.02 


3.53 


1781 


0.221 


6.5864 


15.0713 


0.3274 


0.3051 


0.2626 


E8 


134070 


/ 


/ 


/ 


90766 


0.0130 


/ 


/ 


/ 


4.9173 X 10"^'^ 




2.9447 


127.37 


24.93 


132.79 


10.071 


0.4539 


0.7749 


0.0426 


0.0405 


0.0031 


ElO 


202.49 


/ 


13788 


/ 


61.917 


0.4462 


/ 


0.3385 


/ 


0.0765 


Ell 


224.91 


/ 


718.82 


/ 


70.950 


0.3498 


/ 


0.0903 


/ 


0.0151 



TABLE II 

The performance comparison of the 4 current matrix factorization methods and the proposed D&C method in 
experiments e7-e1 1. in each cell, the values from the left to the right refer to the wlra, wlberg, cvpr05, 
cvprio, and d&c methods, respectively. the best result in each experiment is highlighted. x denotes the original 
un-corrupted matrix, and u and v denote the outputs of the corresponding matrix factorization method. / means 
that the corresponding method on the experiment could not be completed in reasonable time. 

robust matrix factorization methods cannot be made available at such data scales (see Table In]), we further 
picked up 336 points from the dinosaur sequence to form a smaller 336 x 72 matrix, and also added 10% 
outliers to it to compose the input measurements of the experiment E9. 

As the experiments E1-E6, the original un-corrupted matrix, denoted as X, is saved as ground truth in 
each experiment for comparison purpose. 

Four current low-rank matrix factorization methods were employed for comparison. They include the 
WLRA [14] and Wiberg methods [13], which are typical methods designed for the L2 norm model ©, 
the CVPR05 [9] and CVPRIO [5] methods, which are current state-of-the-art methods for solving the 
Li norm model dH). All of the utilized methods adopted similar initialization for each of the involved 
experiments. The performances of these methods are compared in Table HI] (that of E7 is depicted as the 
average result over 100 experiments). 

The advantage of the proposed D&C method is evident based on Table [III in terms of both computational 
speed and accuracy. On one hand, our algorithm always attains the most accurate reconstruction of the 
original data matrix by the product of the obtained low-rank matrices U and V, and on the other hand, the 
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computation cost of the proposed algorithm is the smallest of all employed methods in most experiments 
(except being the second smallest in E9). It is very impressive that the computational speed of the proposed 
algorithm is even faster than the WLRA and the Wiberg methods, which are constructed for L2 norm 
matrix factorization model, in most cases (except slower than WLRA in E9). Considering the difficulty of 
solving the Li model due to its non-convexity and non-smoothness, the efficiency of the proposed method 
is more prominent. 

V. CONCLUSION 

In this paper we have tried a new methodology, the divide and conquer technique, for solving the Li 
norm low-rank matrix factorization problems ^ and (H)). The main idea is to break the original large 
problems into smallest possible sub-problems, each involving only one unique scalar parameter. We have 
proved that these sub-problems are convex, and have closed form solutions. Inspired by this theoretical 
result, fast algorithms have been constructed to handle the original large problems, entirely avoiding 
the complicated numerical optimization for the inner loops of the iteration. In specific, we have proved 
that the computational complexity of the new algorithms is approximately linear in both the size and 
dimensionality of the input data, which enables the possible utilization of the new algorithms in large- 
scale Li norm matrix factorization problems. The convergence of the new algorithms have also been 
theoretically validated. Based on the experimental results on a series of synthetic and real data sets, it 
has been substantiated that the proposed algorithms attain very robust performance on data with outliers 
and missing components. As compared with the current state-of-the-art methods, our algorithms exhibit 
notable advantages in both computational speed and accuracy. The experimental results also illuminate 
the potential usefulness of the proposed algorithms on large-scale face recognition and SFM applications. 
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